Dimensionality Reduction
CSharpNumerics includes unsupervised dimensionality reduction as an optional preprocessing step in both supervised and clustering pipelines. Reducers implement IDimensionalityReducer and slot into the pipeline between feature selection and scaling.
using CSharpNumerics.ML;
Pipeline Order
| Pipeline | Order |
|---|---|
| Supervised | Selector → Reducer → Scaler → Model |
| Clustering | Reducer → Scaler → Model |
Algorithms
Principal Component Analysis (PCA)
Class: PCA
Projects data onto the top eigenvectors of the covariance matrix. Uses power iteration with deflation for eigendecomposition.
Hyperparameters:
- NComponents: number of output dimensions
- MaxIterations: power iteration limit (default 1000)
- Tolerance: convergence threshold (default 1e-8)
- Seed: optional random seed
Exposes after fit: Components, ExplainedVariance, ExplainedVarianceRatio, Mean
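To make the description above concrete, here is a minimal, self-contained sketch of PCA by power iteration with deflation. It is an illustration under stated assumptions, not the library's source: the class name `PowerIterationPcaSketch`, the method `TopComponents`, the `n - 1` covariance divisor, and the direction-change stopping rule are all hypothetical choices made for this example. The parameters mirror the hyperparameters listed above.

```csharp
using System;

public static class PowerIterationPcaSketch
{
    // Returns the top nComponents eigenvalues and unit eigenvectors of the
    // sample covariance matrix of data (rows = samples, columns = features).
    public static (double[] Values, double[][] Vectors) TopComponents(
        double[][] data, int nComponents,
        int maxIterations = 1000, double tolerance = 1e-8, int seed = 0)
    {
        int n = data.Length, d = data[0].Length;

        // Per-feature mean, used to center the data.
        var mean = new double[d];
        foreach (var row in data)
            for (int j = 0; j < d; j++) mean[j] += row[j] / n;

        // Sample covariance matrix (the n - 1 divisor is an assumption).
        var cov = new double[d, d];
        foreach (var row in data)
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    cov[i, j] += (row[i] - mean[i]) * (row[j] - mean[j]) / (n - 1);

        var rng = new Random(seed);
        var values = new double[nComponents];
        var vectors = new double[nComponents][];

        for (int c = 0; c < nComponents; c++)
        {
            // Random start vector, normalized to unit length.
            var v = new double[d];
            for (int j = 0; j < d; j++) v[j] = rng.NextDouble() - 0.5;
            Scale(v, 1.0 / Math.Sqrt(Dot(v, v)));

            for (int iter = 0; iter < maxIterations; iter++)
            {
                var w = Multiply(cov, v);
                double norm = Math.Sqrt(Dot(w, w));
                if (norm < 1e-15) break; // no variance left to extract
                Scale(w, 1.0 / norm);

                // Stop once the direction no longer changes.
                double change = 0.0;
                for (int j = 0; j < d; j++) change += (w[j] - v[j]) * (w[j] - v[j]);
                v = w;
                if (Math.Sqrt(change) < tolerance) break;
            }

            // The Rayleigh quotient gives the eigenvalue for the converged vector.
            double lambda = Dot(v, Multiply(cov, v));
            values[c] = lambda;
            vectors[c] = v;

            // Deflation: subtract the found component so the next pass
            // converges to the next-largest eigenvalue.
            for (int i = 0; i < d; i++)
                for (int j = 0; j < d; j++)
                    cov[i, j] -= lambda * v[i] * v[j];
        }
        return (values, vectors);
    }

    static double Dot(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int i = 0; i < a.Length; i++) sum += a[i] * b[i];
        return sum;
    }

    static void Scale(double[] a, double factor)
    {
        for (int i = 0; i < a.Length; i++) a[i] *= factor;
    }

    static double[] Multiply(double[,] m, double[] v)
    {
        var result = new double[v.Length];
        for (int i = 0; i < v.Length; i++)
            for (int j = 0; j < v.Length; j++)
                result[i] += m[i, j] * v[j];
        return result;
    }
}
```

For the four points (2, 0), (-2, 0), (0, 1), (0, -1), the covariance matrix is diagonal with entries 8/3 and 2/3, so the sketch should recover those two eigenvalues, with the coordinate axes (up to sign) as components.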
Clustering Pipeline Integration
var experiment = ClusteringExperiment
    .For(X)
    .WithAlgorithm(new KMeans())
    .TryClusterCounts(2, 8)
    .WithEvaluator(new SilhouetteEvaluator())
    .WithReducer(new PCA { NComponents = 5 })
    .WithScaler(new StandardScaler())
    .Run();

Console.WriteLine(experiment.BestClusterCount);
Clustering Grid Integration
var experiment = ClusteringExperiment
    .For(X)
    .WithGrid(new ClusteringGrid()
        .AddModel<KMeans>(g => g
            .Add("K", 2, 3, 4, 5)
            .AddReducer<PCA>(r => r.Add("NComponents", 2, 5, 10))
            .AddScaler<StandardScaler>(s => { })))
    .WithEvaluator(new SilhouetteEvaluator())
    .Run();
⥠Supervised Pipeline Integrationâ
var result = SupervisedExperiment
    .For(X, y)
    .WithGrid(new PipelineGrid()
        .AddModel<KNearestNeighbors>(g => g
            .Add("K", 3, 5, 7)
            .AddReducer<PCA>(r => r.Add("NComponents", 2, 5))
            .AddScaler<StandardScaler>(s => { }))
        .AddModel<DecisionTree>(g => g
            .Add("MaxDepth", 3, 5, 10)))
    .WithCrossValidator(CrossValidatorConfig.KFold(folds: 5))
    .Run();
Key Points
- Reducers are optional: existing pipelines work unchanged
- PCA uses power iteration, so it needs no external dependencies
- ExplainedVarianceRatio shows how much variance each component captures
- Grid search over NComponents picks the best-scoring dimensionality among the candidate values
- Works with both supervised and clustering pipelines
- Follows the same FitTransform / Transform / Clone pattern as scalers
- Implements IHasHyperparameters for grid search integration
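
As a complement to grid search, the explained-variance figures can also be used to shortlist NComponents by hand. The sketch below assumes the per-component variances are available as a plain `double[]` covering all components, sorted in descending order. The actual type of ExplainedVariance is not stated here, and the helper names `Ratios` and `ComponentsFor` are hypothetical.

```csharp
using System;
using System.Linq;

public static class ExplainedVarianceSketch
{
    // Each ratio is that component's variance divided by the total variance.
    public static double[] Ratios(double[] variances)
    {
        double total = variances.Sum();
        return variances.Select(v => v / total).ToArray();
    }

    // Smallest number of leading components whose ratios sum to at least
    // the given threshold (for example 0.85 for 85% of the variance).
    public static int ComponentsFor(double[] ratios, double threshold)
    {
        double cumulative = 0.0;
        for (int k = 0; k < ratios.Length; k++)
        {
            cumulative += ratios[k];
            if (cumulative >= threshold) return k + 1;
        }
        return ratios.Length; // threshold not reached: keep everything
    }
}
```

With variances 8, 4, 2, 2 the ratios are 0.5, 0.25, 0.125, 0.125, so a threshold of 0.85 is first met by the leading three components (cumulative 0.875). A value found this way could then seed the NComponents candidates passed to the grid.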